Results 1 - 13 of 13
1.
Article in English | MEDLINE | ID: mdl-38739513

ABSTRACT

In the real world, data distributions often exhibit multiple granularities. However, most existing neighbor-based machine-learning methods rely on manually setting a single granularity for neighbor relationships, handling every data point at that one granularity, which severely limits their accuracy and efficiency. This paper adopts a dual-pronged approach: it constructs a multi-granularity representation of the data using the granular-ball computing model, boosting the algorithm's time efficiency, and it leverages that representation to create tailored multi-granularity neighborhood relationships for different task scenarios, improving algorithmic accuracy. The experimental results demonstrate that the proposed multi-granularity neighbor relationship effectively enhances KNN classification and clustering methods. The source code has been publicly released on GitHub at https://github.com/xjnine/MGNR.
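To make the granular-ball idea concrete, below is a minimal sketch of how a multi-granularity ball representation might be generated: data are recursively split (here with a crude 2-means seeded by the two farthest points) until each ball is compact enough. All names and the stopping rule are illustrative assumptions, not the authors' MGNR API.

```python
import numpy as np

def split_two_means(points, iters=10):
    """Crude 2-means: seed with the two mutually farthest points, then
    alternate assignment and center updates for a few rounds."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centers = points[[i, j]].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centers[None, :], axis=2), axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return [points[labels == k] for k in range(2) if np.any(labels == k)]

def granular_balls(points, max_radius, min_size=2):
    """Recursively split the data until each ball's mean radius is below
    max_radius (or the ball is too small to split further).
    Returns a list of (center, radius, members) tuples."""
    center = points.mean(axis=0)
    radius = np.linalg.norm(points - center, axis=1).mean()
    if len(points) <= min_size or radius <= max_radius:
        return [(center, radius, points)]
    parts = split_two_means(points)
    if len(parts) < 2:  # degenerate split: stop to avoid infinite recursion
        return [(center, radius, points)]
    return [b for part in parts for b in granular_balls(part, max_radius, min_size)]
```

The resulting balls form a coarse-grained view of the data on which neighbor relationships can then be defined per granularity level.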

2.
Article in English | MEDLINE | ID: mdl-37943647

ABSTRACT

The Pawlak rough set (PRS) and the neighborhood rough set (NRS) are the two most common rough set theoretical models. The PRS can use equivalence classes to represent knowledge but cannot process continuous data, while the NRS can process continuous data but loses the ability to use equivalence classes to represent knowledge. To remedy this deficit, this article presents a granular-ball rough set (GBRS) built on granular-ball computing, combining its robustness and adaptability. The GBRS can represent both the PRS and the NRS simultaneously, enabling it to deal with continuous data and to use equivalence classes for knowledge representation. In addition, we propose an implementation algorithm for the GBRS by introducing the GBRS positive region into the PRS framework. The experimental results on benchmark datasets demonstrate that the learning accuracy of the GBRS is significantly higher than that of the PRS and the traditional NRS. The GBRS also outperforms nine popular or state-of-the-art feature selection methods. All source code for this article is available at http://www.cquptshuyinxia.com/GBRS.html and https://github.com/syxiaa/GBRS.
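The positive region mentioned above is the classical Pawlak construct that the GBRS carries over: the set of objects whose condition equivalence class lies entirely within one decision class. A minimal sketch (the GBRS version operates on granular balls rather than raw equivalence classes; this shows only the underlying PRS notion):

```python
def positive_region(cond_classes, dec_classes):
    """Pawlak positive region: the union of all condition equivalence
    classes that fall entirely inside some decision class."""
    pos = set()
    for c in cond_classes:
        if any(c <= d for d in dec_classes):  # c is a subset of a decision class
            pos |= c
    return pos
```

For example, with condition classes {1,2}, {3,4} and decision classes {1,2,3}, {4}, only {1,2} is consistent, so the positive region is {1, 2}.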

3.
Article in English | MEDLINE | ID: mdl-37566496

ABSTRACT

The density peaks clustering algorithm (DP) has difficulty clustering large-scale data because it requires the full distance matrix to compute the density and δ-distance of each object, which takes O(n²) time. A granular ball (GB) is a coarse-grained representation of data, based on the observation that an object and its local neighbors have similar distributions and are likely to belong to the same class. GBs were introduced into supervised learning by Xia et al. to improve the efficiency of methods such as support vector machines, k-nearest neighbor classification, and rough sets. Inspired by this idea, we introduce GBs into unsupervised learning for the first time and propose a GB-based DP algorithm called GB-DP. First, it generates GBs from the original data with an unsupervised partitioning method. Then, it defines the density of each GB, instead of the density of objects, from its center, radius, and the distances between its members and the center, without setting any parameters. After that, it takes the distance between GB centers as the distance between GBs and defines the δ-distance of GBs. Finally, it uses the GBs' density and δ-distance to plot the decision graph, applies the DP algorithm to cluster the GBs, and expands the clustering result to the original data. Since there is no need to compute the distance between every pair of objects, and the number of GBs is far smaller than the size of the data, GB-DP greatly reduces the running time of the DP algorithm. Compared with k-means, ball k-means, DP, DPC-KNN-PCA, FastDPeak, and DLORE-DP, GB-DP obtains similar or even better clustering results in much less running time without setting any parameters. The source code is available at https://github.com/DongdongCheng/GB-DP.
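The density/δ-distance step over ball summaries can be sketched as follows. The density formula here (member count over radius) is an illustrative stand-in for the paper's parameter-free definition; the δ-distance follows the standard DP convention of distance to the nearest denser item:

```python
import numpy as np

def dp_on_balls(centers, sizes, radii):
    """Density-peaks statistics over granular-ball summaries: density is
    (illustratively) member count over radius; delta is the distance to the
    nearest denser ball, or the maximum distance for the densest ball."""
    centers = np.asarray(centers, float)
    density = np.asarray(sizes, float) / (np.asarray(radii, float) + 1e-12)
    dist = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
    delta = np.empty(len(centers))
    for i in range(len(centers)):
        denser = density > density[i]
        delta[i] = dist[i, denser].min() if denser.any() else dist[i].max()
    return density, delta
```

Balls with both high density and high δ are picked as cluster centers in the decision graph, exactly as objects would be in plain DP, but over far fewer items.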

4.
Article in English | MEDLINE | ID: mdl-37027748

ABSTRACT

Due to its simplicity, K-means has become a widely used clustering method. However, its clustering result is seriously affected by the initial centers, and its allocation strategy makes it hard to identify manifold clusters. Many improved K-means variants have been proposed to accelerate the algorithm and improve the quality of the initial cluster centers, but few researchers have addressed K-means' weakness in discovering arbitrarily shaped clusters. Using graph distance (GD) to measure the dissimilarity between objects is a good way to solve this problem, but computing the GD is time-consuming. Inspired by the idea that a granular ball uses a ball to represent local data, we select representatives from each local neighborhood, called natural density peaks (NDPs). On the basis of NDPs, we propose a novel K-means algorithm for identifying arbitrarily shaped clusters, called NDP-Kmeans. It defines a neighbor-based distance between NDPs and uses it to compute the GD between NDPs. Afterward, an improved K-means with high-quality initial centers and GD is used to cluster the NDPs. Finally, each remaining object is assigned according to its representative. The experimental results show that our algorithm can recognize not only spherical clusters but also manifold clusters. Therefore, NDP-Kmeans has more advantages in detecting arbitrarily shaped clusters than other excellent algorithms.
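A graph distance of the kind used above can be sketched generically: connect representatives within a neighborhood threshold, then take shortest-path distances, so dissimilarity follows the manifold rather than straight lines. This is a standard construction (here via Floyd-Warshall with a Euclidean ε-graph), not the paper's exact neighbor-based definition:

```python
import numpy as np

def graph_distance(points, eps):
    """Shortest-path (graph) distance over an eps-neighborhood graph:
    pairs within eps are connected by their Euclidean distance, all other
    pairs start at infinity, then Floyd-Warshall closes the paths."""
    p = np.asarray(points, float)
    n = len(p)
    d = np.linalg.norm(p[:, None] - p[None, :], axis=2)
    g = np.where(d <= eps, d, np.inf)
    np.fill_diagonal(g, 0.0)
    for k in range(n):  # relax all paths through intermediate node k
        g = np.minimum(g, g[:, [k]] + g[[k], :])
    return g
```

For three points forming a bend, the graph distance between the endpoints is the path length along the bend, longer than the straight-line distance, which is what lets K-means-style assignment follow a manifold.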

5.
Article in English | MEDLINE | ID: mdl-37058387

ABSTRACT

The hierarchical quotient space structure (HQSS), a typical description in granular computing (GrC), focuses on hierarchically granulating fuzzy data and mining hidden knowledge. The key step in constructing an HQSS is transforming a fuzzy similarity relation into a fuzzy equivalence relation. However, this transformation has high time complexity, and it is difficult to mine knowledge directly from a fuzzy similarity relation because of its information redundancy, i.e., the sparsity of effective information. This article therefore proposes an efficient granulation approach that constructs the HQSS by quickly extracting the effective values of the fuzzy similarity relation. First, the effective values and effective positions of a fuzzy similarity relation are defined according to whether they are retained in the fuzzy equivalence relation. Second, the number and composition of effective values are characterized to determine which elements are effective. On this basis, redundant information and sparse effective information in a fuzzy similarity relation can be completely distinguished. Next, both the isomorphism and the similarity between two fuzzy similarity relations are studied via effective values, and the isomorphism between two fuzzy equivalence relations is discussed on the same basis. Then, a low-time-complexity algorithm for extracting the effective values of a fuzzy similarity relation is introduced, and on top of it an algorithm for constructing the HQSS is presented to realize efficient granulation of fuzzy data. The proposed algorithms accurately extract effective information from the fuzzy similarity relation and construct the same HQSS as the fuzzy equivalence relation while greatly reducing the time complexity. Finally, experiments on 15 UCI datasets, 3 UKB datasets, and 5 image datasets are reported and analyzed to verify the effectiveness and efficiency of the proposed algorithms.
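For reference, the expensive baseline transformation the abstract alludes to is the standard max-min transitive closure, which turns a symmetric fuzzy similarity matrix into a fuzzy equivalence relation; thresholding the closure at any level λ then yields one crisp quotient layer of the HQSS. This sketch shows the baseline, not the paper's faster effective-value algorithm:

```python
import numpy as np

def transitive_closure(R):
    """Max-min transitive closure of a symmetric fuzzy similarity matrix:
    repeatedly apply T[i, j] = max_k min(R[i, k], R[k, j]) until stable.
    Each squaring costs O(n^3), which is the complexity the paper avoids."""
    R = np.asarray(R, float)
    while True:
        # for symmetric R, min(R[i,k], R[j,k]) is the max-min composition term
        S = np.maximum(R, np.minimum(R[:, None, :], R[None, :, :]).max(axis=2))
        if np.allclose(S, R):
            return S
        R = S
```

For R = [[1, .8, .4], [.8, 1, .5], [.4, .5, 1]], the closure raises the (0, 2) entry to 0.5 via the chain through element 1, restoring max-min transitivity.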

6.
Neural Netw ; 157: 364-376, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36403345

ABSTRACT

Learning graph embeddings for high-dimensional data is an important technique for dimensionality reduction. The learning process is expected to preserve the discriminative and geometric information of high-dimensional data in a new low-dimensional subspace via either manual or automatic graph construction. Although both manual and automatic graph constructions can capture the geometry and discrimination of data to a certain degree, neither working alone can fully explore the underlying data structure. To preserve as much discriminative and geometric information of the high-dimensional data as possible in the low-dimensional subspace, we develop a novel Discriminative and Geometry-Preserving Adaptive Graph Embedding (DGPAGE). It systematically integrates manual and adaptive graph construction in one unified graph embedding framework, effectively injecting the essential information of predefined graphs into the learning of an adaptive graph, in order to achieve both adaptability and specificity to the data. Learning the adaptive graph jointly with the optimized projections, DGPAGE can generate an embedded subspace with better pattern discrimination for image classification. Results from extensive experiments on image datasets show that DGPAGE outperforms state-of-the-art graph-based dimensionality reduction methods. The ablation studies show that it is beneficial to have an integrated framework, like DGPAGE, that brings together the advantages of manual and adaptive graph construction.


Subjects
Algorithms; Pattern Recognition, Automated; Pattern Recognition, Automated/methods; Learning
7.
IEEE Trans Neural Netw Learn Syst ; 34(4): 2144-2155, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34460405

ABSTRACT

This article presents a general sampling method, called granular-ball sampling (GBS), for classification problems, by introducing the idea of granular computing. The GBS method uses adaptively generated hyperballs to cover the data space, and the points on the hyperballs constitute the sampled data. GBS is the first sampling method that not only reduces the data size but also improves the data quality in noisy-label classification. In addition, because the GBS method can describe the decision boundary exactly, it obtains almost the same classification accuracy as on the original datasets, and clearly higher accuracy than random sampling. For the data reduction classification task, GBS is therefore a general method not restricted to any specific classifier or dataset. Moreover, GBS can be used effectively as an undersampling method for imbalanced classification. Its time complexity is close to O(N), so it can accelerate most classifiers. These advantages make GBS powerful for improving the performance of classifiers. All code has been released in the open-source GBS library at http://www.cquptshuyinxia.com/GBS.html.
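One illustrative reading of "the points on the hyperballs constitute the sampled data" is to keep, per ball, the members farthest from the ball center, i.e. the points nearest the ball surface. This helper is a hedged sketch of that per-ball step, not the GBS library's actual routine:

```python
import numpy as np

def sample_ball_surface(points, k=2):
    """Keep the k members of one ball that lie farthest from its center,
    approximating the points 'on' the hyperball surface."""
    points = np.asarray(points, float)
    center = points.mean(axis=0)
    d = np.linalg.norm(points - center, axis=1)
    return points[np.argsort(d)[-k:]]
```

Applied ball by ball, this yields a reduced sample concentrated near ball boundaries, which is where class boundaries live.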

8.
Article in English | MEDLINE | ID: mdl-36197862

ABSTRACT

Granular-ball computing (GBC) is an efficient, robust, and scalable learning method for granular computing, and granular ball (GB) generation is its core step. This article proposes a method for accelerating GB generation by using division to replace k-means; it significantly improves the efficiency of GB generation while ensuring accuracy similar to that of existing methods. In addition, a new adaptive method for GB generation is proposed, which eliminates GB overlap and accounts for other factors, making the generation process parameter-free and fully adaptive in the true sense. This study also provides the first mathematical models for the GB covering. The experimental results on real datasets demonstrate that the two proposed GB generation methods achieve accuracies similar to those of existing methods in most cases while realizing adaptiveness or acceleration. All code has been released in the open-source GBC library at http://www.cquptshuyinxia.com/GBC.html and https://github.com/syxiaa/gbc.

9.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 87-99, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750814

ABSTRACT

This paper presents a novel accelerated exact k-means, called "ball k-means", which uses a ball to describe each cluster and focuses on reducing point-centroid distance computations. Ball k-means can exactly find the neighbor clusters of each cluster, so distances are computed only between a point and the centroids of its neighbor clusters instead of all centroids. Moreover, each cluster can be divided into a "stable area" and an "active area", and the latter is further divided into exact "annular areas". The assignment of points in the stable area does not change, while points in each annular area are adjusted only among a few neighbor clusters. No upper or lower bounds are maintained in the whole process. Ball k-means combines ball clusters and neighbor searching with several novel strategies for reducing centroid distance computations. In comparison with the current state-of-the-art accelerated exact bounded methods, the Yinyang and Exponion algorithms, as well as other top-of-the-line tree-based and bounded methods, ball k-means attains higher performance and performs fewer distance calculations, especially for large-k problems. Its faster speed, absence of extra parameters, and simpler design make ball k-means an all-around replacement for naive k-means.
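The neighbor-cluster pruning can be sketched with a simple geometric test: a point x of cluster i (so ||x - c_i|| ≤ r_i) can only be closer to another centroid c_j if ||c_i - c_j|| < 2·r_i, since x would have to lie past the bisecting hyperplane between the centroids. This captures the pruning idea only; the paper's exact criteria are more refined:

```python
import numpy as np

def neighbor_clusters(centroids, radii):
    """Candidate neighbors per cluster: j is a neighbor of i when the
    centroid distance is below 2 * r_i, where r_i is the radius of ball i
    (the distance from its centroid to its farthest member)."""
    c = np.asarray(centroids, float)
    d = np.linalg.norm(c[:, None] - c[None, :], axis=2)
    return [
        [j for j in range(len(c)) if j != i and d[i, j] < 2 * radii[i]]
        for i in range(len(c))
    ]
```

During assignment, each point then compares itself only against its own centroid and the centroids in this (usually short) neighbor list.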

10.
IEEE Trans Cybern ; 52(8): 7732-7741, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33566780

ABSTRACT

Image annotation aims to jointly predict multiple tags for an image. Although significant progress has been achieved, existing approaches usually overlook the alignment between specific labels and their corresponding regions due to weakly supervised information (i.e., a "bag of labels" for regions), thus failing to explicitly exploit the discrimination between different classes. In this article, we propose the deep label-specific feature (Deep-LIFT) learning model to build an explicit and exact correspondence between a label and its local visual region, which improves the effectiveness of feature learning and enhances the interpretability of the model itself. Deep-LIFT extracts features for each label by aligning the label with its region; specifically, this is achieved by learning multiple correlation maps between image convolutional features and label embeddings. Moreover, we construct two variant graph convolutional networks (GCNs) to further capture the interdependency among labels. Empirical studies on benchmark datasets validate that the proposed model achieves superior multilabel classification performance over existing state-of-the-art methods.


Subjects
Algorithms; Data Curation
11.
IEEE Trans Cybern ; 52(10): 10444-10457, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33909577

ABSTRACT

This article presents a simple sampling method for classification, called "random space division sampling" (RSDS), which is very easy to implement and builds on the idea of random space division. It extracts the boundary points as the sampled result by efficiently distinguishing label-noise points, inner points, and boundary points. This makes it the first general sampling method for classification that can both reduce the data size and enhance the classification accuracy of a classifier, especially in label-noisy classification. "General" means that it is not restricted to any specific classifier or dataset (regardless of whether the dataset is linear or not). Furthermore, RSDS can accelerate most classifiers online because its time complexity is lower than that of most classifiers. Moreover, RSDS can be used as an undersampling method for imbalanced classification. The experimental results on benchmark datasets demonstrate its effectiveness and efficiency. The code for RSDS and the comparison algorithms is available at https://github.com/syxiaa/RSDS.
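The flavor of random space division can be sketched as follows: repeatedly cut the space with a few random axis-parallel thresholds and record how often each point lands in a cell containing more than one class. Points almost always in pure cells behave like inner points; points often in mixed cells are boundary or label-noise candidates. This is a hedged simplification, not the RSDS algorithm itself:

```python
import numpy as np

def mixed_cell_score(X, y, rounds=30, cuts=3, seed=0):
    """Fraction of random partitions in which each point's cell is
    class-impure. Cells are defined by `cuts` random axis-parallel
    thresholds; higher scores suggest boundary/noise points."""
    X, y = np.asarray(X, float), np.asarray(y)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mixed = np.zeros(n)
    for _ in range(rounds):
        feats = rng.integers(0, d, cuts)
        ths = np.array([rng.uniform(X[:, f].min(), X[:, f].max()) for f in feats])
        cell = (X[:, feats] > ths) @ (2 ** np.arange(cuts))  # cell id per point
        for c in np.unique(cell):
            in_cell = cell == c
            if len(np.unique(y[in_cell])) > 1:
                mixed[in_cell] += 1
    return mixed / rounds
```

Thresholding this score would then separate boundary points (kept) from inner points (dropped), which is what gives the data-size reduction.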

Subjects
Datasets as Topic; Algorithms
12.
IEEE Trans Neural Netw Learn Syst ; 33(7): 2916-2930, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33428577

ABSTRACT

Mitigating label noise is a crucial problem in classification. Noise filtering is an effective way of dealing with label noise that does not need to estimate the noise rate or rely on any loss function. However, most filtering methods focus mainly on binary classification, leaving the more difficult problem of multiclass classification relatively unexplored. To remedy this deficit, we present a definition of label noise in a multiclass setting and propose a general framework for a novel label noise filtering learning method for multiclass classification. Two noise filtering methods for multiclass classification, multiclass complete random forest (mCRF) and multiclass relative density, are derived from their binary counterparts using our proposed framework. In addition, to optimize the NI_threshold hyperparameter in mCRF, we propose two new optimization methods: a voting cross-validation method and an adaptive method that employs a 2-means clustering algorithm. Furthermore, we incorporate SMOTE into our label noise filtering learning framework to handle the ubiquitous problem of imbalanced data in multiclass classification. We report experiments on both synthetic datasets and UCI benchmarks to demonstrate that our proposed methods are highly robust to label noise in comparison with state-of-the-art baselines. All code and data are available at https://github.com/syxiaa/Multiclass-Label-Noise-Filtering-Learning.
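To illustrate what a multiclass noise filter does, here is a deliberately simple stand-in in the same spirit: flag a point when the majority of its k nearest neighbors carry a different label. The paper's mCRF and relative-density filters are considerably more sophisticated; this only shows the filtering interface:

```python
import numpy as np

def knn_noise_flags(X, y, k=3):
    """Flag suspected label-noise points: a point is flagged when fewer
    than half of its k nearest neighbors share its label."""
    X, y = np.asarray(X, float), np.asarray(y)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors
    agree = (y[nn] == y[:, None]).sum(axis=1)
    return agree < (k + 1) // 2          # True -> suspected noise
```

A filtering learner would drop (or relabel) the flagged points before training, without ever estimating the noise rate.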

13.
IEEE Trans Neural Netw Learn Syst ; 33(8): 3675-3689, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33635795

ABSTRACT

A multi-scale decision system (MDS) is an effective tool for describing hierarchical data in machine learning. Optimal scale combination (OSC) selection and attribute reduction are two key issues related to knowledge discovery in MDSs. However, searching for all OSCs may cause a combinatorial explosion, and existing approaches typically incur excessive time consumption. In this study, searching for all OSCs is treated as an optimization problem with the scale space as the search space. Accordingly, a sequential three-way decision model of the scale space is established to reduce the search space by integrating three-way decision with the Hasse diagram. First, a novel scale combination is proposed to perform scale selection and attribute reduction simultaneously, and an extended stepwise optimal scale selection (ESOSS) method is introduced to quickly search for a single local OSC on a subset of the scale space. Second, based on the obtained local OSCs, a sequential three-way decision model of the scale space is established to divide the search space into three pairwise disjoint regions: the positive, negative, and boundary regions. The boundary region is treated as a new search space, and it can be proved that a local OSC on the boundary region is also a global OSC. Therefore, all OSCs of a given MDS can be obtained by searching for local OSCs on successive boundary regions in a step-by-step manner. Finally, using properties of the Hasse diagram, a formula for calculating the maximal elements of a given boundary region is provided to reduce space complexity. Accordingly, an efficient OSC selection algorithm is proposed that improves the efficiency of searching for all OSCs by reducing the search space. The experimental results demonstrate that the proposed method can significantly reduce computational time.
